Visualising and modelling changes in categorical variables in longitudinal studies
Abstract
BACKGROUND Graphical techniques can provide visually compelling insights into complex data patterns. In this paper we present a type of lasagne plot that shows changes in categorical variables for participants measured at regular intervals over time, and we propose statistical models to estimate the distributions of marginal and transitional probabilities.

METHODS The plot uses stacked bars to show the distribution of a categorical variable at each time interval. Different colours depict different categories, and changes in colour show the trajectories of participants over time. The models are based on nominal logistic regression, which is appropriate for both ordinal and nominal categorical variables. To illustrate the plots and models, we analyse data on smoking status, body mass index (BMI) and physical activity level from a longitudinal study on women's health. To estimate marginal distributions we fit survey wave as an explanatory variable, whereas for transitional distributions we fit participants' status (e.g. smoking status) at previous surveys.

RESULTS For the illustrative data, the marginal models showed BMI increasing, physical activity decreasing and smoking decreasing linearly over time at the population level. The plots and transition models showed smoking status to be highly predictable for individuals, whereas BMI was only moderately predictable and physical activity was virtually unpredictable. Most of the predictive power came from participants' status at the previous survey. Predicted probabilities from the models mostly agreed with observed probabilities, indicating adequate goodness-of-fit.

CONCLUSIONS The proposed form of lasagne plot provides a simple visual aid for showing transitions in categorical variables over time in longitudinal studies. The suggested models complement the plot and allow formal testing and estimation of marginal and transitional distributions. These simple tools can provide valuable insights into categorical data on individuals measured at regular intervals over time.
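The quantities behind the plot and the two kinds of model can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' code. The toy data, category labels and the function names `marginal_distribution`, `transition_matrix`, `predict_next` and `lasagne_order` are hypothetical. The sketch relies on a standard property: when the only explanatory variable is categorical (survey wave, or status at the previous survey), a nominal logistic model is saturated, so its fitted probabilities equal the observed proportions. The linear-trend marginal models described above, or transition models using more than one earlier survey, would instead need a full multinomial logistic regression fit. Pooling transitions across waves also assumes they are the same at every wave.

```python
from collections import Counter

# Hypothetical toy data (invented for illustration, not from the study):
# smoking status of six participants at four survey waves.
CATEGORIES = ["never", "ex", "current"]
DATA = {
    "p1": ["never", "never", "never", "never"],
    "p2": ["current", "current", "ex", "ex"],
    "p3": ["current", "current", "current", "ex"],
    "p4": ["ex", "ex", "ex", "ex"],
    "p5": ["never", "never", "never", "never"],
    "p6": ["current", "ex", "ex", "ex"],
}


def marginal_distribution(data, categories):
    """Proportion of participants in each category at each wave.

    Each dict corresponds to one stacked bar of the plot. These are also the
    fitted probabilities of a nominal logistic model with wave as a categorical
    (saturated) explanatory variable; a linear wave term would smooth them.
    """
    n_waves = len(next(iter(data.values())))
    result = []
    for w in range(n_waves):
        counts = Counter(seq[w] for seq in data.values())
        total = sum(counts.values())
        result.append({c: counts[c] / total for c in categories})
    return result


def transition_matrix(data, categories):
    """Estimate P(status at wave t = j | status at wave t-1 = i), pooled over waves.

    Assumption: only the immediately preceding survey is used, and transitions
    are pooled across waves. Rows for categories never seen as a previous
    status are NaN.
    """
    counts = {i: Counter() for i in categories}
    for seq in data.values():
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1
    matrix = {}
    for i in categories:
        total = sum(counts[i].values())
        matrix[i] = {j: (counts[i][j] / total if total else float("nan"))
                     for j in categories}
    return matrix


def predict_next(dist, matrix, categories):
    """Predicted distribution at the next wave from the current distribution."""
    return {j: sum(dist[i] * matrix[i][j] for i in categories) for j in categories}


def lasagne_order(data, categories):
    """Sort participants by trajectory so rows with the same category sit together."""
    rank = {c: k for k, c in enumerate(categories)}
    return sorted(data, key=lambda pid: [rank[s] for s in data[pid]])


if __name__ == "__main__":
    for pid in lasagne_order(DATA, CATEGORIES):
        # One text row per participant; the first letter stands in for a colour.
        print(pid, "".join(s[0] for s in DATA[pid]))
    marginals = marginal_distribution(DATA, CATEGORIES)
    matrix = transition_matrix(DATA, CATEGORIES)
    for w in range(1, len(marginals)):
        predicted = predict_next(marginals[w - 1], matrix, CATEGORIES)
        print(f"wave {w}: observed", marginals[w], "predicted", predicted)
```

Comparing the observed and predicted distributions at each wave is a crude analogue of the goodness-of-fit check described in the abstract. Sorting rows by trajectory keeps the first column grouped into contiguous category blocks, as a stacked bar is.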